Sam McLeod

mentions: 1 | type: Person

// recent coverage: 1 mention

22:01

2026-06-23

pub.towardsai.net

large-language-models

A GPU-Poor’s Guide to Local LLM Inference in 2026

A 35-billion-parameter Mixture-of-Experts model runs at 28 tokens per second with full 128K context on a 2019 gaming laptop with a GTX 1660 Ti and 6 GB VRAM using llama.cpp's --n-cpu-moe flag and Turb…

// co-occurs with top 7 entities

llama.cpp (1), Qwen3.6 35B-A3B (1), GTX 1660 Ti (1), RTX 3060 (1), RTX 4060 (1), Mac Studio (1), Turboquant (1)